# vLLM Optimization
## QwQ 32B INT8 W8A8
ospatch · Apache-2.0 · Large Language Model · Transformers · English · 590 downloads · 4 likes

INT8-quantized version of QwQ-32B: both weights and activations are reduced to 8-bit (W8A8), lowering the memory footprint and enabling faster INT8 compute for vLLM inference.
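W8A8 means both the weight matrix and the runtime activations are stored as INT8 with a floating-point scale. A minimal numpy sketch of the idea (illustrative only; vLLM's actual kernels use fused INT8 GEMMs, and the shapes here are hypothetical):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: x ≈ scale * q."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# Toy weight matrix and activation batch (hypothetical sizes).
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
a = rng.standard_normal((8, 64)).astype(np.float32)

qw, sw = quantize_int8(w)   # W8: weights quantized offline
qa, sa = quantize_int8(a)   # A8: activations quantized at runtime

# INT8 matmul accumulated in INT32, then dequantized with both scales.
y_q = qa.astype(np.int32) @ qw.astype(np.int32).T
y = y_q.astype(np.float32) * (sa * sw)

y_ref = a @ w.T
err = np.abs(y - y_ref).max() / np.abs(y_ref).max()
print(f"max relative error: {err:.4f}")  # small for 8-bit, roughly the 1e-2 range
```

The key point is that the expensive matmul runs entirely in integer arithmetic; the two scales are applied once to the INT32 accumulator.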
## Whisper Large V3 W4A16
nm-testing · Apache-2.0 · Speech Recognition · Transformers · English · 20 downloads · 1 like

Quantized version of openai/whisper-large-v3 with INT4 weights while activations are kept in FP16 (W4A16), suitable for vLLM inference.
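In contrast to W8A8, W4A16 is weight-only: the 4-bit weights are dequantized (typically per group of columns, since a single scale is too coarse at 4 bits) and the matmul runs in FP16. A simplified numpy sketch under those assumptions (group size and shapes are illustrative, not taken from this model):

```python
import numpy as np

def quantize_w4_groupwise(w, group_size=32):
    """Weight-only INT4 quantization with one scale per group of columns."""
    rows, cols = w.shape
    w_g = w.reshape(rows, cols // group_size, group_size)
    scale = np.abs(w_g).max(axis=-1, keepdims=True) / 7.0      # symmetric INT4 range [-7, 7]
    q = np.clip(np.round(w_g / scale), -7, 7).astype(np.int8)  # 4-bit values stored in int8
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 64)).astype(np.float32)
a = rng.standard_normal((4, 64)).astype(np.float16)  # A16: activations stay FP16

q, scale = quantize_w4_groupwise(w)
w_deq = (q * scale).reshape(w.shape).astype(np.float16)

# The matmul itself runs in FP16 against the dequantized weights.
y = a @ w_deq.T
err = np.abs(y.astype(np.float32) - a.astype(np.float32) @ w.T).max()
print("max abs error:", err)
```

Per-group scales keep the 4-bit rounding error local: one outlier weight only degrades its own group of 32 values rather than the whole tensor.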
## Qwen2.5 VL 3B Instruct Quantized W8A8
RedHatAI · Apache-2.0 · Image-to-Text · Transformers · English · 274 downloads · 1 like

Quantized version of Qwen/Qwen2.5-VL-3B-Instruct with INT8 weights and INT8 activations (W8A8); accepts combined image-and-text input and produces text output.
## Pixtral 12B FP8 Dynamic
RedHatAI · Apache-2.0 · Image-to-Text · Safetensors · Multilingual · 87.31k downloads · 9 likes

pixtral-12b-FP8-dynamic is a quantized version of mistral-community/pixtral-12b. Quantizing weights and activations to FP8 cuts disk size and GPU memory requirements by roughly 50%. Suitable for commercial and research use in multiple languages.
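The roughly 50% figure follows directly from halving the bytes per parameter (2-byte BF16 down to 1-byte FP8). A quick sanity check with illustrative numbers, ignoring embeddings, the KV cache, and runtime overhead:

```python
# Back-of-envelope weight memory for a 12B-parameter model.
params = 12e9
bf16_gib = params * 2 / 1024**3   # 2 bytes per parameter in BF16
fp8_gib = params * 1 / 1024**3    # 1 byte per parameter in FP8
saving = 1 - fp8_gib / bf16_gib
print(f"BF16: {bf16_gib:.1f} GiB, FP8: {fp8_gib:.1f} GiB, saving: {saving:.0%}")
```

"Dynamic" here refers to the activation scales being computed at runtime per batch, so no calibration dataset is needed; the weight scales are still fixed offline.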
## DeepSeek Coder V2 Lite Instruct FP8
RedHatAI · Other (license) · Large Language Model · Transformers · 11.29k downloads · 7 likes

FP8-quantized version of DeepSeek-Coder-V2-Lite-Instruct, optimized for inference efficiency and suitable for commercial and research use in English.